# Image caption generation

**Gemma 3 4b It Qat 4bit** (mlx-community; license: Other; 607 downloads, 1 like)
Gemma 3 4B IT QAT 4bit is a 4-bit quantized large language model trained with Quantization-Aware Training (QAT), based on the Gemma 3 architecture and optimized for the MLX framework.
Tags: Image-to-Text, Transformers, Other

**Florence 2 Base Gpt4 Captioner V1** (Vimax97; license: MIT; 224 downloads, 0 likes)
A GPT-4o-style caption generator fine-tuned from Florence-2-base-ft for generating image descriptions.
Tags: Image-to-Text, Transformers, Supports Multiple Languages

**Llama 3.2 11B Vision Instruct Nf4** (SeanScripts; 658 downloads, 12 likes)
A 4-bit (NF4) quantized version of meta-llama/Llama-3.2-11B-Vision-Instruct, supporting image understanding and text generation tasks.
Tags: Image-to-Text, Transformers

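A minimal loading sketch for this kind of NF4 setup, assuming the transformers and bitsandbytes libraries: the gated meta-llama base checkpoint is quantized on the fly here, while the pre-quantized repository from the listing should load directly via `from_pretrained` without an extra quantization config.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

# NF4 4-bit quantization: the listed repo ships weights already stored this way.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # gated; requires access approval
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
processor = AutoProcessor.from_pretrained(base_id)

image = Image.open("photo.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(out[0], skip_special_tokens=True))
```
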
**Tvl Mini 0.1** (2Vasabi; license: Apache-2.0; 23 downloads, 2 likes)
A LoRA fine-tune of the Qwen2-VL-2B model for Russian, supporting multimodal tasks.
Tags: Image-to-Text, Transformers, Supports Multiple Languages

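A minimal sketch of how such a LoRA adapter is typically attached to the Qwen2-VL-2B base with the peft library; the adapter repository path below is a hypothetical placeholder inferred from the listing.

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

base_id = "Qwen/Qwen2-VL-2B-Instruct"
adapter_id = "2Vasabi/tvl-mini-0.1"  # hypothetical path inferred from the listing

# Load the base vision-language model, then attach the LoRA adapter weights on top.
model = Qwen2VLForConditionalGeneration.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
processor = AutoProcessor.from_pretrained(base_id)
```

After attaching the adapter, inference follows the usual Qwen2-VL chat-template flow; `merge_and_unload()` can bake the adapter into the base weights for deployment.
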
**Zcabnzh Bp** (nanxiz; license: BSD-3-Clause; 19 downloads, 0 likes)
BLIP is a unified vision-language pretraining framework that excels at tasks such as image caption generation and visual question answering, with performance enhanced by an innovative data-filtering mechanism.
Tags: Image-to-Text, Transformers

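A minimal captioning sketch using the upstream BLIP checkpoint from Salesforce, which this entry appears to repackage; assumes the transformers and Pillow packages.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

model_id = "Salesforce/blip-image-captioning-base"  # upstream BLIP checkpoint
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")

# Unconditional captioning: the caption is generated from the image alone.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```
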
**Florence 2 Large Ft** (andito; license: MIT; 93 downloads, 4 likes)
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle various vision and vision-language tasks.
Tags: Image-to-Text, Transformers

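A minimal captioning sketch for Florence-2's prompt-based interface, shown here against the upstream microsoft/Florence-2-large-ft checkpoint rather than this mirror; assumes transformers with `trust_remote_code` enabled.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large-ft"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
task = "<DETAILED_CAPTION>"  # task token selects the captioning behaviour

inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(raw, task=task, image_size=image.size)
print(result[task])
```

Swapping the task token to `<CAPTION>` or `<MORE_DETAILED_CAPTION>` changes the verbosity of the generated description.
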
**Paligemma Rich Captions** (gokaygokay; license: Apache-2.0; 66 downloads, 9 likes)
An image caption generation model fine-tuned from PaliGemma-3b on the DocCI dataset, capable of generating detailed descriptions of 200-350 characters with reduced hallucination.
Tags: Image-to-Text, Transformers, English

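A minimal captioning sketch assuming the model keeps the standard PaliGemma interface in transformers; the repository path and the `caption en` prompt prefix are assumptions carried over from the base model's conventions and may differ for this fine-tune.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "gokaygokay/paligemma-rich-captions"  # path inferred from the listing
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
prompt = "caption en"  # base PaliGemma captioning prefix (assumed)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=192)
# Drop the prompt tokens so only the generated caption is decoded.
prompt_len = inputs["input_ids"].shape[1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))
```
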
**Spydazwebai Image Projectors** (LeroyDyer; 560 downloads, 1 like)
An image-to-text model based on the Transformers library, capable of converting image content into descriptive text, particularly suited to the art domain.
Tags: Image-to-Text, Supports Multiple Languages

**Uform Gen2 Qwen 500m** (unum-cloud; license: Apache-2.0; 17.98k downloads, 76 likes)
UForm-Gen is a small generative vision-language model used primarily for image caption generation and visual question answering.
Tags: Image-to-Text, Transformers, English

**Git Base One Piece** (ayoubkirouane; license: MIT; 16 downloads, 0 likes)
A vision-language model fine-tuned from Microsoft's git-base model, designed specifically to generate descriptive text captions for images from the anime 'One Piece'.
Tags: Image-to-Text, Transformers, Supports Multiple Languages

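A minimal sketch using the high-level transformers `image-to-text` pipeline; the repository path is inferred from the listing and may differ from the actual Hub id.

```python
from transformers import pipeline

# The pipeline wraps the GIT processor and model behind a single call.
captioner = pipeline("image-to-text", model="ayoubkirouane/git-base-One-Piece")

result = captioner("one_piece_frame.jpg", max_new_tokens=50)
print(result[0]["generated_text"])
```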